A Clustering Algorithm Based on Search Results

doi:10.3969/j.issn.1006-2475.2012.11.010

Computer and Modernization, 2012, Vol. 1, Issue 11: 35-38.

Optimization of Search Results Based on Clustering Algorithm

LUO Zhao-hang (罗钊航), LI Xu-wei (李旭伟)

College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China

Received: 2012-07-13  Online: 2012-11-10  Published: 2012-11-10

Abstract

Current search engines return many redundant results and offer no classification to guide users through them. This paper proposes an optimization algorithm for web search results based on an improved DBSCAN (density-based spatial clustering of applications with noise) algorithm, which clusters and classifies the results. The algorithm selects the pages whose search weight exceeds a given threshold, extracts each page's feature values and candidate keywords, marks the feature range, and then compares page similarity to eliminate duplicate and redundant pages as far as possible. It also provides classifications based on the pages' candidate keywords, with the aim of improving both the precision of the search results and user satisfaction with them.

Key words: DBSCAN algorithm (density-based clustering), page similarity, clustering, redundant pages
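The abstract does not spell out the improved DBSCAN variant, so the following is only a minimal sketch of the pipeline it describes, using plain DBSCAN. Everything concrete here is an assumption made for illustration: pages are dictionaries with hypothetical `weight`, `features`, and `keywords` fields, page similarity is Jaccard similarity over feature sets, the highest-weighted page of each cluster is kept as its representative, and the first candidate keyword serves as the category.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dbscan(items, eps, min_pts, sim):
    """Plain DBSCAN over a similarity function.

    j is a neighbour of i when sim(items[i], items[j]) >= eps.
    Returns one label per item; -1 marks noise.
    """
    n = len(items)
    neighbours = [
        [j for j in range(n) if sim(items[i], items[j]) >= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: expand
    return labels


def optimize_results(pages, min_weight=0.5, eps=0.5, min_pts=2):
    """Filter by weight, cluster near-duplicates, group survivors by keyword."""
    kept = [p for p in pages if p["weight"] > min_weight]
    labels = dbscan(kept, eps, min_pts,
                    lambda a, b: jaccard(a["features"], b["features"]))
    best = {}    # cluster label -> highest-weighted page in that cluster
    unique = []  # noise pages have no near-duplicates, so all are kept
    for page, label in zip(kept, labels):
        if label == -1:
            unique.append(page)
        elif label not in best or page["weight"] > best[label]["weight"]:
            best[label] = page
    categories = {}
    for page in list(best.values()) + unique:
        # Assumption: the first candidate keyword names the category.
        categories.setdefault(page["keywords"][0], []).append(page["url"])
    return categories


# Hypothetical sample data, not taken from the paper.
pages = [
    {"url": "a", "weight": 0.9, "keywords": ["programming"],
     "features": {"python", "tutorial", "basics", "loop"}},
    {"url": "b", "weight": 0.8, "keywords": ["programming"],
     "features": {"python", "tutorial", "basics", "list"}},
    {"url": "c", "weight": 0.7, "keywords": ["zoology"],
     "features": {"python", "snake", "habitat", "reptile"}},
    {"url": "d", "weight": 0.2, "keywords": ["programming"],
     "features": {"python", "tutorial", "basics", "loop"}},
]
print(optimize_results(pages))
```

In this sample, page `d` falls below the weight threshold, pages `a` and `b` form one dense cluster from which only the higher-weighted `a` survives, and `c` is kept as an isolated page under its own category. The thresholds `min_weight`, `eps`, and `min_pts` are placeholder values and would need tuning on real search results.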

CLC Number: TP391

Cite this article: LUO Zhao-hang, LI Xu-wei. Optimization of Search Results Based on Clustering Algorithm[J]. Computer and Modernization, 2012, 1(11): 35-38.

